Monitoring and managing Earth's forests in an informed manner is an important requirement for addressing challenges such as biodiversity loss and climate change. While traditional field or airborne campaigns for forest assessment provide accurate data for analysis at the regional level, scaling them to entire countries and beyond at high resolution is hardly possible. In this work, we propose a Bayesian deep learning approach to estimate forest structure variables country-wide at 10 m resolution, using freely available satellite imagery as input. Our method jointly transforms Sentinel-2 optical images and Sentinel-1 synthetic aperture radar images into maps of five different forest structure variables: 95th height percentile, mean height, density, Gini coefficient, and fractional cover. We train and test our model on 41 airborne laser scanning missions across Norway and demonstrate that it generalizes to unseen test regions, achieving normalized mean absolute errors between 11% and 15%, depending on the variable. Our work is also the first to propose a Bayesian deep learning approach for predicting forest structure variables with well-calibrated uncertainty estimates. These increase the trustworthiness of the model and its suitability for downstream tasks that require reliable confidence estimates, such as informed decision making. We present an extensive set of experiments to validate the accuracy of the predicted maps as well as the quality of the predicted uncertainties. To demonstrate scalability, we provide Norway-wide maps for the five forest structure variables.
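The abstract does not specify the network, but one common way to obtain the calibrated per-pixel uncertainties it describes is a heteroscedastic regression head trained with a Gaussian negative log-likelihood. The sketch below illustrates that idea in PyTorch; the layer sizes, variable names, and loss are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of per-pixel uncertainty estimation for dense regression,
# in the spirit of the Bayesian deep learning setup described above.
# Architecture and loss are assumptions for illustration only.
import torch
import torch.nn as nn

class HeteroscedasticHead(nn.Module):
    """Predicts a mean and a log-variance for each forest structure variable."""
    def __init__(self, in_channels: int, n_variables: int = 5):
        super().__init__()
        self.mean = nn.Conv2d(in_channels, n_variables, kernel_size=1)
        self.log_var = nn.Conv2d(in_channels, n_variables, kernel_size=1)

    def forward(self, features):
        return self.mean(features), self.log_var(features)

def gaussian_nll(mean, log_var, target):
    # Negative log-likelihood of a Gaussian; penalizes both prediction error
    # and over/under-confident variance estimates.
    return 0.5 * (torch.exp(-log_var) * (target - mean) ** 2 + log_var).mean()

# Toy usage: backbone features fed to the head, loss on dummy targets.
features = torch.randn(2, 64, 32, 32)   # (batch, channels, H, W)
targets = torch.randn(2, 5, 32, 32)     # 5 forest structure variables
head = HeteroscedasticHead(in_channels=64)
mean, log_var = head(features)
loss = gaussian_nll(mean, log_var, targets)
loss.backward()
```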
Graph Neural Networks (GNNs) achieve state-of-the-art performance on graph-structured data across numerous domains. Their underlying ability to represent nodes as summaries of their vicinities has proven effective for homophilous graphs in particular, in which same-type nodes tend to connect. On heterophilous graphs, in which different-type nodes are likely connected, GNNs perform less consistently, as neighborhood information might be less representative or even misleading. On the other hand, GNN performance is not inferior on all heterophilous graphs, and there is a lack of understanding of what other graph properties affect GNN performance. In this work, we highlight the limitations of the widely used homophily ratio and the recent Cross-Class Neighborhood Similarity (CCNS) metric in estimating GNN performance. To overcome these limitations, we introduce 2-hop Neighbor Class Similarity (2NCS), a new quantitative graph structural property that correlates with GNN performance more strongly and consistently than alternative metrics. 2NCS considers two-hop neighborhoods as a theoretically derived consequence of the two-step label propagation process governing GCN's training-inference process. Experiments on one synthetic and eight real-world graph datasets confirm consistent improvements over existing metrics in estimating the accuracy of GCN- and GAT-based architectures on the node classification task.
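For intuition, the sketch below shows one plausible way to summarize the class composition of each node's two-hop neighborhood; the exact 2NCS definition is given in the paper and may differ from this approximation.

```python
# Illustrative sketch of a two-hop neighborhood class statistic in the spirit
# of 2NCS. This is an assumption-laden approximation for intuition only, not
# the metric defined in the paper.
import numpy as np

def two_hop_class_histograms(adj: np.ndarray, labels: np.ndarray, n_classes: int):
    """For each node, the class distribution of its two-hop neighborhood."""
    two_hop = ((adj @ adj + adj) > 0).astype(float)   # reachable within 2 hops
    np.fill_diagonal(two_hop, 0)                      # exclude the node itself
    one_hot = np.eye(n_classes)[labels]               # (n_nodes, n_classes)
    counts = two_hop @ one_hot
    return counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

# Toy graph: a path of 4 nodes with 2 classes.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
labels = np.array([0, 0, 1, 1])
hists = two_hop_class_histograms(adj, labels, n_classes=2)
# Comparing histograms of same-class vs. different-class node pairs gives a
# crude structural score per class.
print(hists)
```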
Crop type maps are critical for tracking agricultural land use and estimating crop production. Remote sensing has proven an efficient and reliable tool for creating these maps in regions with abundant ground labels for model training, yet these labels remain difficult to obtain in many regions and years. NASA's Global Ecosystem Dynamics Investigation (GEDI) spaceborne lidar instrument, originally designed for forest monitoring, has shown promise for distinguishing tall and short crops. In the current study, we leverage GEDI to develop wall-to-wall maps of short vs tall crops on a global scale at 10 m resolution for 2019-2021. Specifically, we show that (1) GEDI returns can reliably be classified into tall and short crops after removing shots with extreme view angles or topographic slope, (2) the frequency of tall crops over time can be used to identify months when tall crops are at their peak height, and (3) GEDI shots in these months can then be used to train random forest models that use Sentinel-2 time series to accurately predict short vs. tall crops. Independent reference data from around the world are then used to evaluate these GEDI-S2 maps. We find that GEDI-S2 performed nearly as well as models trained on thousands of local reference training points, with accuracies of at least 87% and often above 90% throughout the Americas, Europe, and East Asia. Systematic underestimation of tall crop area was observed in regions where crops frequently exhibit low biomass, namely Africa and South Asia, and further work is needed in these systems. Although the GEDI-S2 approach only differentiates tall from short crops, in many landscapes this distinction goes a long way toward mapping the main individual crop types. The combination of GEDI and Sentinel-2 thus presents a very promising path towards global crop mapping with minimal reliance on ground data.
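Step (3) amounts to a standard supervised classifier trained on Sentinel-2 time series features with GEDI-derived labels. The sketch below shows that setup with scikit-learn on synthetic placeholder data; the feature construction and hyperparameters are assumptions, not the study's configuration.

```python
# Sketch of training a random forest on Sentinel-2 time series features with
# GEDI-derived tall/short crop labels. Data here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_timesteps = 1000, 12                 # e.g. monthly S2 composites
X = rng.normal(size=(n_samples, n_timesteps))     # per-pixel reflectance/index series
y = rng.integers(0, 2, size=n_samples)            # 0 = short, 1 = tall (from GEDI)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```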
The proliferation of radical online communities and their violent offshoots has sparked great societal concern. However, the current practice of banning such communities from mainstream platforms has unintended consequences: (i) the further radicalization of their members on the fringe platforms to which they migrate; and (ii) the spillover of harmful content from fringe back onto mainstream platforms. Here, in a large observational study on two banned subreddits, r/The_Donald and r/fatpeoplehate, we examine how factors associated with the RECRO radicalization framework relate to users' migration decisions. Specifically, we quantify how these factors affect users' decisions to post on fringe platforms and, for those who do, whether they continue posting on the mainstream platform. Our results show that individual-level factors, those relating to the behavior of users, are associated with the decision to post on the fringe platform, whereas social-level factors, users' connection with the radical community, only affect the propensity to be coactive on both platforms. Overall, our findings pave the way for evidence-based moderation policies, as the decisions to migrate and remain coactive amplify unintended consequences of community bans.
One of the major challenges in Deep Reinforcement Learning for control is the need for extensive training to learn the policy. Motivated by this, we present the design of the Control-Tutored Deep Q-Networks (CT-DQN) algorithm, a Deep Reinforcement Learning algorithm that leverages a control tutor, i.e., an exogenous control law, to reduce learning time. The tutor can be designed using an approximate model of the system, without any assumption about the knowledge of the system's dynamics. There is no expectation that it will be able to achieve the control objective if used stand-alone. During learning, the tutor occasionally suggests an action, thus partially guiding exploration. We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing. We demonstrate that CT-DQN is able to achieve better or equivalent data efficiency with respect to the classic function approximation solutions.
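The core mechanism, occasionally letting an exogenous tutor propose the action during exploration, can be sketched as a modified epsilon-greedy rule. The probabilities and the toy tutor law below are placeholders, not the CT-DQN rule from the paper.

```python
# Sketch of control-tutored action selection: with some probability the
# exogenous tutor proposes the action, otherwise standard epsilon-greedy
# exploration is used. Probabilities and the tutor law are placeholders.
import random

def select_action(state, q_values, tutor_policy, epsilon=0.1, tutor_prob=0.2):
    if random.random() < tutor_prob:
        return tutor_policy(state)                 # suggestion from the control tutor
    if random.random() < epsilon:
        return random.randrange(len(q_values))     # random exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy action

# Toy usage: a tutor for an inverted pendulum that pushes against the tilt.
tutor = lambda state: 0 if state["angle"] > 0 else 1
action = select_action({"angle": 0.3}, q_values=[0.1, 0.4], tutor_policy=tutor)
print(action)
```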
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems. Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor). The objective is to learn the optimal (representation, predictor)-pair for each task, under the assumption that the optimal representation is common to all tasks. Within this framework, efficient learning algorithms should transfer knowledge across tasks. We consider the best-arm identification problem for a fixed confidence, where, in each round, the learner actively selects both a task, and an arm, and observes the corresponding reward. We derive instance-specific sample complexity lower bounds satisfied by any $(\delta_G,\delta_H)$-PAC algorithm (such an algorithm identifies the best representation with probability at least $1-\delta_G$, and the best predictor for a task with probability at least $1-\delta_H$). We devise an algorithm OSRL-SC whose sample complexity approaches the lower bound, and scales at most as $H(G\log(1/\delta_G)+ X\log(1/\delta_H))$, with $X,G,H$ being, respectively, the number of tasks, representations and predictors. By comparison, this scaling is significantly better than the classical best-arm identification algorithm that scales as $HGX\log(1/\delta)$.
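To see the size of the gap between the two scalings, the following back-of-the-envelope comparison plugs in arbitrary example values (not from the paper):

```latex
% Illustrative comparison with example values X=50 tasks, G=20 representations,
% H=5 predictors, and \delta_G=\delta_H=\delta=0.01; these numbers are assumptions.
\[
  HGX\log(1/\delta) = 5 \cdot 20 \cdot 50 \cdot \log(100) \approx 2.3\times 10^{4},
  \qquad
  H\bigl(G\log(1/\delta_G) + X\log(1/\delta_H)\bigr)
  = 5\,(20+50)\log(100) \approx 1.6\times 10^{3}.
\]
```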
In recent years, there has been a growing interest in the effects of data poisoning attacks on data-driven control methods. Poisoning attacks are well known in the Machine Learning community, but existing attacks rely on assumptions, such as cross-sample independence, that in general do not hold for linear dynamical systems. Consequently, these systems require different attack and detection methods than those developed for supervised learning problems in the i.i.d. setting. Since most data-driven control algorithms make use of the least-squares estimator, we study how poisoning impacts the least-squares estimate through the lens of statistical testing, and question in what way data poisoning attacks can be detected. We establish under which conditions the set of models compatible with the data includes the true model of the system, and we analyze different poisoning strategies for the attacker. On the basis of the arguments hereby presented, we propose a stealthy data poisoning attack on the least-squares estimator that can escape classical statistical tests, and conclude by showing the efficiency of the proposed attack.
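The object under study is the ordinary least-squares estimate of the system matrix from state trajectories. The sketch below illustrates that estimate and a simple residual statistic of the kind a detector might monitor; the paper's actual test and attack are not reproduced here.

```python
# Generic illustration: least-squares identification of a linear system and a
# residual autocorrelation check. A stealthy poisoning attack would aim to
# corrupt the estimate while keeping such statistics unremarkable.
import numpy as np

rng = np.random.default_rng(0)
n, T = 2, 200
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
X = np.zeros((n, T + 1))
for t in range(T):
    X[:, t + 1] = A_true @ X[:, t] + rng.normal(scale=0.1, size=n)

# Least-squares estimate: A_hat = X_+ X_-^T (X_- X_-^T)^{-1}
X_minus, X_plus = X[:, :-1], X[:, 1:]
A_hat = X_plus @ X_minus.T @ np.linalg.inv(X_minus @ X_minus.T)

# Residuals should look like white noise under the nominal model.
residuals = X_plus - A_hat @ X_minus
print("estimate error:", np.linalg.norm(A_hat - A_true))
print("residual lag-1 autocorrelation:",
      np.mean(residuals[:, 1:] * residuals[:, :-1]) / np.mean(residuals ** 2))
```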
Explainable artificial intelligence (XAI) provides explanations for machine learning (ML) models that are otherwise not interpretable. While many technical approaches exist, there is a lack of validation of these techniques on real-world datasets. In this work, we present a use case of XAI: an ML model trained to estimate electrification rates based on mobile phone data in Senegal. The data originate from the Data for Development challenge by Orange in 2014/15. We apply two model-agnostic, local explanation techniques and find that while the model can be verified, it is biased with respect to population density. We conclude our paper by pointing to the two main challenges we encountered during our work: data processing and model design that might be restricted by currently available XAI methods, and the importance of domain knowledge for interpreting explanations.
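As a generic illustration of a model-agnostic local explanation, the sketch below perturbs one instance's features and records how much a surrogate model's prediction moves. It stands in for, but is not, the specific techniques used in the paper, and the data are synthetic rather than the Orange D4D records.

```python
# Simplified model-agnostic local explanation: per-feature sensitivity of a
# single prediction under local perturbations. Synthetic placeholder data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                        # e.g. aggregated call features
y = 0.7 * X[:, 0] - 0.2 * X[:, 2] + rng.normal(scale=0.1, size=500)
model = GradientBoostingRegressor().fit(X, y)

def local_sensitivity(model, x, scale=0.5, n_samples=200, seed=0):
    """Mean absolute change in prediction when each feature is perturbed."""
    rng = np.random.default_rng(seed)
    base = model.predict(x[None, :])[0]
    scores = []
    for j in range(x.shape[0]):
        x_pert = np.tile(x, (n_samples, 1))
        x_pert[:, j] += rng.normal(scale=scale, size=n_samples)
        scores.append(np.mean(np.abs(model.predict(x_pert) - base)))
    return np.array(scores)

print(local_sensitivity(model, X[0]))   # larger score = more locally influential
```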
This paper presents socially assistive robotics for children with autism, with the aim of using robot-assisted therapy for autism. The goal of the project is to test the interaction between children with autism and the social robot NAO. In particular, the robot will support the operators (psychologists, educators, speech therapists, etc.) in their work. The innovative aspect of the project is that the child-robot interaction will take into account the child's emotions and specific characteristics, and the robot will adapt its behavior accordingly.
Lesion segmentation is a crucial step in the radiomics workflow. Manual segmentation requires long execution times and is prone to variability, undermining the feasibility and robustness of radiomics studies. In this study, a deep-learning automatic segmentation method was applied to computed tomography images of non-small cell lung cancer (NSCLC) patients. The use of manual vs. automatic segmentation in the performance of survival radiomics models was also assessed. Methods: A total of 899 NSCLC patients were included (2 proprietary datasets, A and B, and 1 public dataset, C). Automatic segmentation of lung lesions was performed by training the previously developed architecture nnU-Net, including 2D, 3D, and cascade approaches. The quality of the automatic segmentation was evaluated with the Dice coefficient, using the manual contours as reference. The impact of automatic segmentation on the performance of radiomics models for patient survival was explored by extracting radiomic hand-crafted and deep-learning features from the manual and automatic contours of dataset A, and the accuracies of the models were assessed and compared. Results: The best agreement between automatic and manual contours (Dice = 0.78 ± 0.12) was achieved by averaging the predictions of the 2D and 3D models and applying a post-processing technique to extract the largest connected component. No statistical differences were observed in the performance of the survival models when using manual or automatic contours, or hand-crafted or deep features. The best classifiers showed accuracies between 0.65 and 0.78. Conclusions: The promising role of nnU-Net for the automatic segmentation of lung lesions was confirmed, dramatically reducing the time-consuming workload of physicians without impairing the accuracy of radiomics-based survival prediction models.
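Two of the post-processing and evaluation steps mentioned above, averaging model outputs with largest-connected-component extraction and scoring against manual contours with the Dice coefficient, can be sketched as follows; array shapes and the threshold are illustrative, and this is not the nnU-Net pipeline itself.

```python
# Sketch of Dice scoring against a manual reference mask and extraction of the
# largest connected component from a binary prediction. Illustrative only.
import numpy as np
from scipy import ndimage

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    return 2.0 * np.logical_and(pred, ref).sum() / denom if denom else 1.0

def largest_connected_component(mask: np.ndarray) -> np.ndarray:
    labeled, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))
    return labeled == (np.argmax(sizes) + 1)

# Toy example: average two model outputs, threshold, keep the largest blob.
prob_2d = np.random.default_rng(0).random((64, 64))
prob_3d = np.random.default_rng(1).random((64, 64))
pred = largest_connected_component((prob_2d + prob_3d) / 2 > 0.9)
manual = np.zeros((64, 64), dtype=bool)
manual[20:30, 20:30] = True
print("Dice:", dice(pred, manual))
```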